Schema Vacuuming in Temporal Databases
Temporal databases facilitate the support of historical information by providing functions for indicating the intervals during which a tuple was applicable (along one or more temporal dimensions). Because data are never deleted, only superseded, temporal databases are inherently append-only, resulting over time in a large historical sequence of database states. Data vacuuming in temporal databases allows this sequence to be shortened by strategically, and irrevocably, deleting obsolete data. Schema versioning allows users to maintain a history of database schemata without compromising the semantics of the data or the ability to view data through historical schemata. While the techniques required for data vacuuming in temporal databases have been relatively well covered, the associated area of vacuuming schemata has received less attention. This paper discusses this issue and proposes a mechanism that fits well with existing methods for data vacuuming and schema versioning.
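The abstract does not detail the proposed mechanism. The Python sketch below only illustrates the general idea under assumed definitions:

- Each tuple carries a half-open valid-time interval `[valid_from, valid_to)` and the schema version it was recorded under.
- Vacuuming at a cutoff first drops tuples superseded at or before that cutoff.
- It then drops schema versions that also ended before the cutoff and that no surviving tuple still references.

All names (`TemporalTuple`, `SchemaVersion`, `vacuum`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TemporalTuple:
    key: str
    value: str
    valid_from: int
    valid_to: Optional[int]  # None means the tuple is still current
    schema_version: int      # schema version the tuple was recorded under (assumed)


@dataclass(frozen=True)
class SchemaVersion:
    version: int
    valid_from: int
    valid_to: Optional[int]  # None means this is the current schema


def vacuum(tuples: List[TemporalTuple], schemas: List[SchemaVersion],
           cutoff: int) -> Tuple[List[TemporalTuple], List[SchemaVersion]]:
    """Irrevocably drop data superseded at or before `cutoff`, then drop
    schema versions that ended before `cutoff` and are no longer referenced."""
    kept = [t for t in tuples if t.valid_to is None or t.valid_to > cutoff]
    referenced = {t.schema_version for t in kept}
    kept_schemas = [
        s for s in schemas
        if s.valid_to is None or s.valid_to > cutoff or s.version in referenced
    ]
    return kept, kept_schemas
```

In this sketch schema vacuuming is constrained by data vacuuming. A schema version is kept while any retained tuple still depends on it, so historical data remains viewable through its original schema.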
Experiences in building a tool for navigating association rule result sets
Practical knowledge discovery is an iterative process. First, the experience gained from one mining run is used to inform the parameter settings and the dataset and attribute selection for subsequent runs. Second, additional data, whether incremental additions to existing datasets or the inclusion of additional attributes, means that the mining process is reinvoked, perhaps numerous times. Reducing the number of iterations, improving the accuracy of parameter setting and making the results of each mining run more clearly understandable can thus significantly speed up the discovery process.
In this paper we discuss our experiences in this area and present a system that helps the user navigate through association rule result sets in a way that makes it easier to find useful results in a large result set. We present several techniques that experience has shown to be useful. We discuss the prototype system, IRSetNav, which has capabilities in redundant rule reduction, subjective interestingness evaluation, item and itemset pruning, related information searching, text-based itemset and rule visualisation, hierarchy-based searching and tracking changes between datasets using a knowledge base. Techniques also discussed in the paper, but not yet accommodated into IRSetNav, include input schema selection, longitudinal ruleset analysis and graphical visualisation techniques.
Adelaide, S
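The abstract lists redundant rule reduction among IRSetNav's capabilities but does not define redundancy. The Python sketch below assumes one common definition.

A rule `X → Y` is treated as redundant when another rule `X' → Y` has the same consequent, a strictly smaller antecedent `X' ⊂ X`, and at least the same confidence. This is an illustrative assumption, not necessarily IRSetNav's criterion. `Rule` and `prune_redundant` are hypothetical names.

```python
from typing import FrozenSet, List, NamedTuple


class Rule(NamedTuple):
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]
    confidence: float


def prune_redundant(rules: List[Rule]) -> List[Rule]:
    """Drop rules whose consequent is already predicted, at least as
    confidently, by a rule with a strictly smaller antecedent (assumed test)."""
    kept = []
    for r in rules:
        redundant = any(
            other.consequent == r.consequent
            and other.antecedent < r.antecedent   # strict subset
            and other.confidence >= r.confidence
            for other in rules
        )
        if not redundant:
            kept.append(r)
    return kept
```

For example, `{bread, eggs} → {milk}` at 0.75 is pruned when `{bread} → {milk}` holds at 0.8. However, `{bread, jam} → {milk}` at 0.9 survives, because the extra item raises confidence.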
Detecting anomalous longitudinal associations through higher order mining
The detection of unusual or anomalous data is an important function in automated data analysis or data mining. However, the diversity of anomaly detection algorithms shows that it is often difficult to determine which algorithms might detect anomalies in an arbitrary dataset. In this paper we provide a partial solution to this problem by elevating the search for anomalous data in transaction-oriented datasets to an inspection of the rules that can be produced by higher order longitudinal/spatio-temporal association rule mining. In this way we are able to apply algorithms that may provide a view of anomalies that is arguably closer to that sought by information analysts.
Sydney, NS
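The abstract describes searching for anomalies among mined rules rather than raw transactions, without giving the algorithm. The Python sketch below shows one simple form of such higher order inspection, under assumptions.

Rules are mined separately for each period and represented as a mapping from rule to confidence. A rule in the latest period is flagged under three conditions:

- It is new.
- It has vanished since the previous period.
- Its confidence lies more than `k` standard deviations from its historical mean.

The z-score test, the `min_sigma` floor and the name `anomalous_rules` are all hypothetical choices, not the paper's method.

```python
from statistics import mean, pstdev
from typing import Dict, List


def anomalous_rules(history: List[Dict[str, float]], current: Dict[str, float],
                    k: float = 2.0, min_sigma: float = 0.01) -> Dict[str, str]:
    """Compare the latest period's rule confidences with earlier periods.

    history: one dict (rule -> confidence) per earlier period, oldest first.
    current: rule -> confidence for the latest period.
    """
    flagged: Dict[str, str] = {}
    for rule, conf in current.items():
        past = [period[rule] for period in history if rule in period]
        if not past:
            flagged[rule] = "new"  # never seen in any earlier period
            continue
        sigma = max(pstdev(past), min_sigma)  # floor avoids dividing by ~0
        if abs(conf - mean(past)) / sigma > k:
            flagged[rule] = "shift"  # confidence moved unusually far
    if history:
        for rule in history[-1]:
            if rule not in current:
                flagged[rule] = "vanished"  # held last period, absent now
    return flagged
```

Working at the level of rules means the same test applies whatever the underlying items are. This loosely mirrors the abstract's argument for elevating anomaly detection to the rule level.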
SemGrAM - Integrating semantic graphs into association rule mining
To date, most association rule mining algorithms have assumed that the domains of items are either discrete or, in a limited number of cases, hierarchical, categorical or linear. This constrains the search for interesting rules to those that satisfy the specified quality metrics as independent values or as higher level concepts of those values. However, in many cases the determination of a single hierarchy is not practicable and, for many datasets, an item’s value may be taken from a domain that is more conveniently structured as a graph with weights indicating semantic (or conceptual) distance. Research in the development of algorithms that generate disjunctive association rules has allowed the production of rules such as Radios ∨ TVs → Cables. In many cases, however, there is little semantic relationship between the disjunctive terms, and arguably less readable rules such as Radios ∨ Tuesday → Cables can result. This paper describes two association rule mining algorithms, SemGrAMG and SemGrAMP, that accommodate conceptual distance information contained in a semantic graph. The SemGrAM algorithms permit the discovery of rules that include an association between sets of cognate groups of item values. The paper discusses the algorithms, the design decisions made during their development and some experimental results.
Sydney, NS
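The abstract does not specify how SemGrAMG and SemGrAMP use the semantic graph. The Python sketch below only illustrates the underlying idea under stated assumptions.

Item values are nodes in an undirected weighted graph, and semantic distance is the shortest-path weight, computed here with Dijkstra's algorithm. A group of values is treated as cognate, and so acceptable as a disjunction, only if every pair lies within `max_distance`. The graph contents, the threshold and the names `build_graph`, `shortest_distances` and `is_cognate` are hypothetical.

```python
import heapq
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

Graph = Dict[str, Dict[str, float]]


def build_graph(edges: Iterable[Tuple[str, str, float]]) -> Graph:
    """Undirected weighted graph as an adjacency dict."""
    graph: Graph = {}
    for a, b, w in edges:
        graph.setdefault(a, {})[b] = w
        graph.setdefault(b, {})[a] = w
    return graph


def shortest_distances(graph: Graph, source: str) -> Dict[str, float]:
    """Dijkstra's algorithm: semantic distance from `source` to each reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, w in graph.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def is_cognate(graph: Graph, items: List[str], max_distance: float) -> bool:
    """True if every pair of items is within `max_distance` (assumed criterion)."""
    for a, b in combinations(items, 2):
        if shortest_distances(graph, a).get(b, float("inf")) > max_distance:
            return False
    return True
```

With edges linking Radios and TVs through an Electronics node, `Radios ∨ TVs` passes a threshold of 2.5. `Radios ∨ Tuesday` fails because the two values are unconnected, which matches the abstract's example of a semantically weak disjunction.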
A survey of temporal knowledge discovery paradigms and methods
With the increase in the size of data sets, data mining has recently become an important research topic and is receiving substantial interest from both academia and industry. At the same time, interest in temporal databases has been increasing, and a growing number of both prototype and implemented systems are using an enhanced temporal understanding to explain aspects of behavior associated with the implicit time-varying nature of the universe. This paper investigates the confluence of these two areas, surveys the work to date, and explores the issues involved and the outstanding problems in temporal data mining.
Establishing a lineage for medical knowledge discovery
Medical science has a long history characterised by incidents of extraordinary insight that have resulted in paradigm shifts in the methodologies and approaches used and have moved the discipline forward. While knowledge discovery has much to offer medicine, it cannot be done in ignorance of either this history or the norms of modern medical investigation. This paper explores the lineage of medical knowledge acquisition and discusses the adverse perceptions that data mining techniques will have to surmount to gain acceptance.
Sydney, NS
On the impact of Knowledge Discovery and Data Mining
Knowledge Discovery and Data Mining are powerful automated data analysis tools, and they are predicted to become the most frequently used analytical tools in the near future. The rapid dissemination of these technologies calls for an urgent examination of their social impact. This paper identifies social issues arising from Knowledge Discovery (KD) and Data Mining (DM). An overview of these technologies is presented, followed by a detailed discussion of each issue. The paper's primary intention is to illustrate the cultural context of each issue and, secondarily, to describe the impact of KD and DM in each case. Existing solutions specific to each issue are identified and examined for feasibility and effectiveness, and a solution that provides a suitably context-sensitive means of gathering and analysing sensitive data is proposed and briefly outlined. The paper concludes with a discussion of topics for further consideration.
- …